Aminet 7

home *** CD-ROM | disk | FTP | other *** search

/ Aminet 7 / Aminet 7 - August 1995.iso / Aminet / docs / misc / ConcNews.lha / news / general.programming / comp.programming_1589_000024.msg < prev next >

Wrap

Text File | 1994-11-27 | 2.7 KB | 76 lines

Path: dd.chalmers.se!news.chalmers.se!sunic!EU.net!howland.reston.ans.net!cs.utexas.edu!swrinde!sgiblab!news.cs.indiana.edu!news.Arizona.EDU!math.arizona.edu!CS.Arizona.EDU!not-for-mail From: dave@CS.Arizona.EDU (Dave Schaumann) Newsgroups: comp.programming Subject: Re: Hash function for strings Date: 4 Mar 1994 17:04:23 -0700 Organization: University of Arizona CS Department, Tucson AZ Lines: 64 Message-ID: <2l8ia7$97d@caslon.CS.Arizona.EDU> References: <amundsj.1.0@novell.oih.no> <2l6kis$8jp@scax18.pki-nbg.philips.de> NNTP-Posting-Host: caslon.cs.arizona.edu In article <2l6kis$8jp@scax18.pki-nbg.philips.de>, Frank Munkert <ln_fmu@sle20.pki-nbg.philips.de> wrote: | |In article <amundsj.1.0@novell.oih.no>, amundsj@novell.oih.no (AMUNDSEN JARLE/3AA/D) writes: |> I am trying to find a good hash function for strings, names that is, and |> wonder if anyone has an idea on how good such a function can hash. Given |> that the keys are not known in advance. | |The performance of several different hash function is discussed |in "Aho, Sethi, Ullman: Compilers - Principles, Techniques and Tools; |Addison-Wesley, Reading, 1986." The hash function they use that performs the best is this (well, modified just a bit): int hash_pjw( char *s, unsigned int size ) { char *p ; unsigned int h = 0, g ; for( p = s ; *p != '\0' ; p++ ) { h = (h << 4) + *p if( (g = h & 0xf0000000) ) { /* note: assumes 32-bit ints */ h = h ^ (g >> 24) ; /* here, too */ h = h ^ g ; } } return h % size ; } I've used this with very good results for prime-sized hash tables and internal chaining. For instance, here's the stats for loading /usr/dict/words into my hash table using hash_pjw: number of rehashes : 9 population : 24483 table size : 55609 load factor : 0.440270 total nodes searched : 73433 total search calls : 52216 Avg nodes searched per call: 1.406331 probes keys % ------ ----- ------ 1 18982 77.53 2 3498 91.82 3 1218 96.79 4 449 98.63 5 182 99.37 6 83 99.71 7 36 99.86 8 15 99.92 9 7 99.95 10 6 99.97 11 5 99.99 12 1 100.00 15 1 100.00 Note that total search calls > population because of rehashing. The hash table starts out at a size of 101, which is probably too small, particularly if your planning on loading /usr/dict/words into the hash table very often :-). I can also get a little better behavior by tweaking the probe function. -- Dave Schaumann dave@cs.arizona.edu